To investigate whether treating cancer patients with
erythropoiesis-stimulating agents (ESAs) would increase the mortality risk, Bennett
et al. [Journal of the American Medical Association 299 (2008) 914- 
924] conducted a meta-analysis with the data from 52 phase III trials 
comparing ESAs with placebo or standard of care. With a standard 
parametric random effects modeling approach, the study concluded 
that ESA administration was significantly associated with increased 
average mortality risk. In this article we present a simple nonparamet- 
ric inference procedure for the distribution of the random effects. We 
re-analyzed the ESA mortality data with the new method. Our results 
about the center of the random effects distribution were markedly dif- 
ferent from those reported by Bennett et al. Moreover, our procedure, 
which estimates the distribution of the random effects, as opposed 
to just a simple population average, suggests that the ESA may be 
beneficial to mortality for approximately a quarter of the study pop- 
ulations. This new meta-analysis technique can be implemented with 
study-level summary statistics. In contrast to existing methods for 
parametric random effects models, the validity of our proposal does 
not require the number of studies involved to be large. From the re- 
sults of an extensive numerical study, we find that the new procedure 
performs well even with moderate individual study sample sizes. 

1. Introduction. Conventional meta-analysis techniques have been uti- 
lized frequently to make inferences about a single parameter, for example, 
the center of the distribution of the random or fixed effects. Under the
random effects model, the procedure for estimating the mean of the random
effects proposed by DerSimonian and Laird (DL) (1986) is routinely used in 
practice. Their method utilizes a linear combination of study-specific point 
estimates with the weights depending on the within- and among-study vari- 
ance estimates. This procedure is simple to implement and does not require 
patient-level data. Its validity, however, depends heavily on the individual 
study sample sizes and the number of studies [Brockwell and Gordon (2001), 
Böhning et al. (2002), Sidik and Jonkman (2007) and Viechtbauer (2007)].
In addition, this and other related methods for random effects models in 
meta-analysis do not provide inferences about the distribution function of 
the random effects. Estimation of this distribution function or its quantile 
counterpart provides valuable information for the complex risk-benefit deci- 
sion on a new drug or device. 
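
For readers who want to see the weighting explicitly, the following is a minimal sketch of the standard DerSimonian-Laird moment estimator, written from the textbook formulas rather than taken from the article. The function name and the inputs (`estimates` for the study-specific point estimates, `variances` for their within-study variances) are illustrative assumptions.

```python
import math

def dersimonian_laird(estimates, variances, z=1.96):
    """Return (pooled mean, tau^2, (lower, upper)) for a DL random effects fit.

    estimates: study-specific point estimates (e.g., log hazard ratios)
    variances: their within-study variance estimates
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]                      # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    # Cochran's Q measures among-study heterogeneity
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Moment estimate of the among-study variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]        # random effects weights
    sws = sum(w_star)
    mean = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sws
    se = math.sqrt(1.0 / sws)
    return mean, tau2, (mean - z * se, mean + z * se)
```

The interval returned here is the usual normal-approximation interval for the mean of the random effects; it says nothing about other features of their distribution.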

In a meta-analysis using the data from 52 phase III comparative tri- 
als (ESA vs. placebo or standard of care), Bennett et al. (2008) examined 
whether the erythropoiesis-stimulating agents (ESAs) for treating anemia 
of cancer patients would increase the patients' risk of mortality. The point
and 95% interval estimates of the two-sample study-specific hazard ratios
were presented in Figure 2 of Bennett et al. (2008), who concluded that
administration of ESAs was significantly associated with increased mortality.
Using the DL method, the resulting 95% confidence interval for the mean of 
the random hazard ratios (treated vs. untreated with ESA) across the studies 
was (1.01, 1.20). The lower bound of the interval is barely above 1. Further- 
more, it is known that the DL method can produce liberal confidence interval 
estimates, that is, the true coverage level tends to be smaller (sometimes sub- 
stantially) than the nominal value [Emerson, Hoaglin and Mosteller (1993), 
Hardy and Thompson (1996), Brockwell and Gordon (2001, 2007) and Sidik 
and Jonkman (2002)]. Therefore, the interval estimates reported by Bennett 
et al. may be "too tight." Moreover, from Figure 2 of Bennett et al., it ap- 
pears that the study-specific hazard ratio estimates for 22 out of 52 trials 
are less than 1, suggesting that even if the average hazard ratio is more 
than 1, the ESA may not be harmful in all study populations. Last, since 
the DL method is based on a weighted average of hazard ratio estimates, 
the resulting interval estimates may be sensitive to outliers. 

In this article we propose a simple inference procedure for the percentiles 
of the random effects distribution based on study-level data without assum- 
ing a parametric form of the distribution. We re-analyzed the mortality data 
reported in Bennett et al. (2008). The resulting 95% confidence interval for 
the median of the random hazard ratios was (0.94, 1.26). The 95% confi- 
dence interval for the lower quartile of the random hazard ratios was (0.70, 
0.99), indicating that, in approximately a quarter of the study populations, 
ESA treatment may reduce mortality. In contrast to all existing methods, 
which can only handle inference for the center of the random effects
distribution, the new proposal does not require the number of studies to be large.
The new proposal is theoretically valid when the sample sizes of individ- 
ual studies are large. Through an extensive numerical study, we find that 
the new method performs well even with moderate individual study sample 
sizes. On the other hand, the DL method tends to give liberal confidence 
interval estimators, that is, their coverage levels can be markedly smaller 
than the nominal value. 

2. Interval estimates for percentiles of the random effects distribution. 

Consider a typical two-level hierarchical model. Let $\Pi' = (\Theta, \Lambda')$
be a row vector of random parameters, where $\Theta$ is a univariate parameter
of interest and $\Lambda$ is a finite- or infinite-dimensional vector of
nuisance parameters. Let $G(\cdot)$ be the continuous, completely unspecified
distribution function of $\Theta$. Given an unobservable realization of $\Pi$,
a data set $X$ is generated. Let $\{\Pi_k, X_k\}$, $k = 1, \ldots, K$, be $K$
independent copies of $\{\Pi, X\}$. The problem is how to make inferences, for
instance, about the median $\mu$ of $G(\cdot)$ with $\{X_k, k = 1, \ldots, K\}$.
As an example, consider the case with $K$ $2 \times 2$ tables and let $\Theta_k$
be the log-risk-ratio or risk difference for the $k$th table. Here, the
nuisance parameter consists of the underlying event rate for the "control"
group and the sample size $n_k$ for the $k$th study.

If we can observe $\{\Theta_k, k = 1, \ldots, K\}$, a simple nonparametric
estimator for $\mu$ is the sample median. Exact confidence intervals for $\mu$
can be obtained by inverting a sign test for the null hypothesis that the
median is $\mu_0$. Under $H_0: \mu = \mu_0$, consider

$$T(\mu_0) = \sum_{k=1}^{K} B_k, \tag{1}$$

where $B_k = I(\Theta_k < \mu_0) - I(\Theta_k > \mu_0)$ and $I(\cdot)$ is the
indicator function. The null distribution of $T(\mu_0)$ can be generated by

$$T^* = \sum_{k=1}^{K} \Delta_k, \quad \text{where } \Delta_k = \begin{cases} 1, & \text{with probability } 0.5, \\ -1, & \text{otherwise.} \end{cases} \tag{2}$$

Suppose that, given $\Pi_k$, $\hat{\Theta}_k$ is a consistent estimator for
$\Theta_k$ based on the data $X_k$. To test $H_0$, one may replace $\Theta_k$
in (1) with $\hat{\Theta}_k$. This results in the test statistic

$$\tilde{T}(\mu_0) = \sum_{k=1}^{K} \hat{B}_k = \sum_{k=1}^{K} \{I(\hat{\Theta}_k < \mu_0) - I(\hat{\Theta}_k > \mu_0)\}. \tag{3}$$

When the sample size $n_k$ for each individual study is large, we can make
inferences about the median by comparing the observed value of (3) to the
distribution of (2).
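
A minimal sketch of the unweighted sign test based on (2) and (3) may help fix ideas. It assumes a two-sided test that rejects for large values of $|\tilde{T}(\mu_0)|$, which the text does not spell out, and the function names are illustrative. Because $T^*$ equals $2\,\mathrm{Binomial}(K, 1/2) - K$, the null distribution is available in closed form, and inverting the test yields an interval between two order statistics of the study estimates.

```python
import math

def sign_test_pvalue(theta_hat, mu0):
    """Two-sided p-value for H0: median = mu0 using statistic (3)."""
    k = len(theta_hat)
    t_obs = sum((t < mu0) - (t > mu0) for t in theta_hat)
    # T* in (2) is a sum of K fair +/-1 signs, i.e. 2 * Binomial(K, 1/2) - K
    favourable = sum(math.comb(k, j) for j in range(k + 1)
                     if abs(2 * j - k) >= abs(t_obs))
    return favourable / 2 ** k

def sign_test_median_ci(theta_hat, alpha=0.05):
    """Order-statistic interval obtained by inverting the sign test."""
    k = len(theta_hat)
    xs = sorted(theta_hat)
    tail, r = 0.0, 0
    # r = largest count such that P(Binomial(K, 1/2) <= r - 1) <= alpha / 2
    while r < k and tail + math.comb(k, r) / 2 ** k <= alpha / 2:
        tail += math.comb(k, r) / 2 ** k
        r += 1
    if r == 0:
        return (-math.inf, math.inf)  # too few studies for this confidence level
    return (xs[r - 1], xs[k - r])
```

With ten studies, for instance, the 95% interval runs from the second smallest to the second largest estimate, which illustrates how coarse the unweighted procedure can be when $K$ is small.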
Now, the test based on (3) does not take into account the precision of the
estimator $\hat{\Theta}_k$. It gives equal weight to each individual study.
For the $k$th study, suppose that the variance $\sigma_k^2$ of $\hat{\Theta}_k$
is large relative to the distance between $\hat{\Theta}_k$ and $\mu_0$. Then
the likelihood of the unobservable event $\Theta_k < \mu_0$ can be quite close
to 1/2 (like tossing a fair coin). Therefore, the noise generated from such an
unstable $\hat{B}_k$ may well outweigh its added value to the power of the
test based on $\tilde{T}(\mu_0)$. On the other hand, if $\sigma_k^2$ is small
and $\hat{\Theta}_k < \mu_0$, the likelihood of $\Theta_k < \mu_0$ would be
closer to 1.

This motivates us to modify test statistic (3) by putting weight $w_k$ on
$\hat{B}_k$. Here, $w_k$ is a measure of likelihood of the event
$\Theta_k < \mu_0$, for example, the observed coverage level of the interval
$(-\infty, \mu_0)$ for the realized $\Theta_k$. When the individual study size
$n_k$ is large, and the distribution of $\hat{\Theta}_k$ conditional on $\Pi_k$
is approximately normal with mean $\Theta_k$ and variance $\sigma_k^2$, where
$n_k \sigma_k^2$ converges to a constant, this coverage level is approximately
$\Phi((\mu_0 - \hat{\Theta}_k)/\sigma_k)$, where $\Phi$ is the distribution
function of the standard normal. Let the resulting test statistic be

$$\hat{T}(\mu_0) = \sum_{k=1}^{K} |\Phi((\mu_0 - \hat{\Theta}_k)/\sigma_k) - 1/2| \hat{B}_k. \tag{4}$$

In the Appendix we show that, in probability, for any given $\mu$,

$$|\Phi((\mu - \hat{\Theta}_k)/\sigma_k) - 1/2| \hat{B}_k - B_k/2 \to 0 \quad \text{as } n_k \to \infty. \tag{5}$$

It follows that, for fixed $K$ and large $n_k$, $k = 1, \ldots, K$, the
distribution of $\hat{T}(\mu_0)$ approximates that of $T(\mu_0)/2$. This
approximation, however, is rather discrete; and for moderate sample sizes, the
resulting confidence intervals for $\mu$ do not have adequate coverage levels
in our numerical study (Section 4). An alternative way to generate an
approximation to the null distribution of $\hat{T}(\mu_0)$ is to use

$$\hat{T}^*(\mu_0) = \sum_{k=1}^{K} |\Phi((\mu_0 - \hat{\Theta}_k)/\sigma_k) - 1/2| \Delta_k. \tag{6}$$

Here, the $\Delta_k$'s are the only random quantities and are analogous to the
random multipliers used in the wild bootstrap [Wu (1986)]. The weight from the
$k$th study is multiplied by $\Delta_k$, which is 1 or $-1$ with probability
0.5 and is generated by the analyst independently of the observed data. In
the Appendix, we also justify the asymptotic validity of the test based on
(4) and (6). Confidence intervals for $\mu$ can be obtained by inverting this
test. In contrast to other methods, the new proposal does not require the
number of studies ($K$) to be large. In Section 4 we show empirically that
the new interval estimation procedure performs well even when the sample
sizes ($n_k$) are not large.
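
The test based on (4) and (6) can be sketched in a few lines. This is an illustrative sketch, not the authors' R implementation: the function names are hypothetical, `sigma` holds the study-level standard errors, the null distribution is approximated by Monte Carlo draws of the random signs, and the two-sided rejection rule (large $|\hat{T}(\mu_0)|$) is an assumption.

```python
import math
import random

def norm_cdf(x):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def weighted_statistic(theta_hat, sigma, mu0):
    """Statistic (4): sum_k |Phi((mu0 - theta_hat_k) / sigma_k) - 1/2| * B_hat_k."""
    total = 0.0
    for t, s in zip(theta_hat, sigma):
        weight = abs(norm_cdf((mu0 - t) / s) - 0.5)
        total += weight * ((t < mu0) - (t > mu0))
    return total

def random_sign_pvalue(theta_hat, sigma, mu0, n_draws=2000, seed=0):
    """Monte Carlo p-value from the random-sign null distribution (6).

    Assumption: two-sided test that rejects for large |T_hat(mu0)|.
    """
    rng = random.Random(seed)
    weights = [abs(norm_cdf((mu0 - t) / s) - 0.5)
               for t, s in zip(theta_hat, sigma)]
    t_obs = abs(weighted_statistic(theta_hat, sigma, mu0))
    hits = 0
    for _ in range(n_draws):
        # Each weight is multiplied by an independent fair +/-1 sign
        t_star = sum(w if rng.random() < 0.5 else -w for w in weights)
        hits += abs(t_star) >= t_obs
    return hits / n_draws
```

Only the signs are resampled; the weights stay fixed at their observed values, which is what makes the reference distribution conditional on the data.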
The above proposal can be generalized easily to make inferences about
certain percentiles of the distribution $G(\cdot)$. Specifically, let us
hypothesize that the $100p$th percentile is $\mu_0$. As for the median, define
$B_k = I(\Theta_k < \mu_0) - I(\Theta_k > \mu_0)$, and obtain $\hat{B}_k$ by
replacing $\Theta_k$ in $B_k$ with $\hat{\Theta}_k$. The test statistic is
given by

$$\hat{T}_p(\mu_0) = \sum_{k=1}^{K} |\Phi((\mu_0 - \hat{\Theta}_k)/\sigma_k) - 1/2| \hat{B}_k, \tag{7}$$

and the null distribution is generated by

$$\hat{T}_p^*(\mu_0) = \sum_{k=1}^{K} |\Phi((\mu_0 - \hat{\Theta}_k)/\sigma_k) - 1/2| \Delta_k, \tag{8}$$

where $\Delta_k = 1$ with probability $p$ and $= -1$ with probability $1 - p$.
Let the resulting test statistic corresponding to (3) be denoted by
$\tilde{T}_p(\mu_0)$. Confidence intervals for the $100p$th percentile can then
be obtained by inverting the conditional test accordingly.
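
One way to invert the test based on (7) and (8) is a grid search over candidate values $\mu_0$, sketched below. Several choices here are assumptions rather than details given in the text: the grid range (three standard errors beyond the extreme estimates), the Monte Carlo approximation of the null distribution, and the equal-tailed acceptance rule. The function name is hypothetical.

```python
import math
import random

def norm_cdf(x):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def percentile_ci(theta_hat, sigma, p=0.5, alpha=0.05,
                  n_grid=200, n_draws=1000, seed=0):
    """Grid-search confidence interval for the 100p-th percentile of G."""
    lo = min(t - 3 * s for t, s in zip(theta_hat, sigma))
    hi = max(t + 3 * s for t, s in zip(theta_hat, sigma))
    accepted = []
    for g in range(n_grid + 1):
        mu0 = lo + (hi - lo) * g / n_grid
        z = [norm_cdf((mu0 - t) / s) - 0.5 for t, s in zip(theta_hat, sigma)]
        # |Phi - 1/2| * B_hat_k equals Phi - 1/2, so statistic (7) is sum(z)
        t_obs = sum(z)
        weights = [abs(v) for v in z]
        rng = random.Random(seed)  # same random signs at every grid point
        # Null draws (8): each sign is +1 with probability p, -1 otherwise
        draws = sorted(
            sum(w if rng.random() < p else -w for w in weights)
            for _ in range(n_draws)
        )
        lower_q = draws[int(math.floor(alpha / 2 * n_draws))]
        upper_q = draws[int(math.ceil((1 - alpha / 2) * n_draws)) - 1]
        # Assumption: equal-tailed acceptance region
        if lower_q <= t_obs <= upper_q:
            accepted.append(mu0)
    return (min(accepted), max(accepted)) if accepted else None
```

Setting `p=0.5` gives an interval for the median, while `p=0.25` and `p=0.75` give intervals for the quartiles. A finer grid or more draws trades computing time for resolution.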

3. Safety meta-analysis of erythropoiesis-stimulating agents. We re-analyzed
the data reported in Bennett et al. (2008) using the new proposal. Here
$K = 52$, and for the $k$th study, $\Theta_k$ was the log-hazard ratio and
$\hat{\Theta}_k$ was its estimate. Since the patient-level data were not
available, we approximated the standard error estimate of $\hat{\Theta}_k$ by
one-fourth of the reported length of the 95% confidence interval (converted
to the log-scale). The 95% confidence interval for the median of the
distribution of the random hazard ratio ($\exp(\Theta)$) was (0.94, 1.21)
based on the test statistic $\hat{T}(\cdot)$ and (6). The corresponding
interval based on the indicator functions $\{I(\hat{\Theta}_k < \mu)\}$ via
$\tilde{T}(\cdot)$ was (0.90, 1.26), which was wider than the above interval.
The 95% confidence interval for the mean of the random effects distribution
reported in Bennett et al. (2008) using the DL method was (1.01, 1.20). In the
next section we show that the empirical coverage levels of the DL method can
be substantially lower than their nominal counterparts even when the number
of studies is not that small (say, $K = 40$).
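
The standard error approximation described above amounts to a one-line calculation, sketched here with a hypothetical helper name. Dividing the log-scale interval length by 4 is the rule stated in the text; the exact normal-theory divisor would be $2 \times 1.96 = 3.92$.

```python
import math

def log_hr_and_se(hazard_ratio, ci_lower, ci_upper):
    """Log hazard ratio and approximate standard error from a reported 95% CI."""
    theta_hat = math.log(hazard_ratio)
    # One-fourth of the interval length on the log scale
    se = (math.log(ci_upper) - math.log(ci_lower)) / 4.0
    return theta_hat, se
```

For a hypothetical study reporting a hazard ratio of 1.10 with interval (0.80, 1.51), this gives an estimate of about 0.095 and a standard error of about 0.159.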

The 95% intervals for the 25th and 75th percentiles based on (7) and (8)
were (0.70, 0.99) and (1.18, 1.48), respectively. The counterparts based on
$\tilde{T}_p(\cdot)$ were (0.49, 0.93) and (1.25, 1.72). Again, the intervals
based on $\hat{T}_p(\cdot)$ were shorter than those with $\tilde{T}_p(\cdot)$.
Note that the upper bound of the 95% interval for the 25th percentile was
smaller than 1, which suggested that, approximately, for a quarter of the
study populations, their average hazard ratios for the ESA versus the control
were most likely less than one. That is, on average, the patients in these
study populations may benefit from taking ESA with respect to mortality.
Further investigation to identify characteristics of these trials would be
informative for identifying future cancer patients who would benefit from 
the ESAs through reduction of blood cell transfusions and improved quality 
of life. On the other hand, it is crucial to identify future patients who would 
have unacceptable toxicity risks. 

Bennett et al. (2008) also separately evaluated cancer-related anemia with 
six studies (see the top portion of Figure 2 in Bennett et al.) and investigated 
whether ESAs would increase the risk of a venous thromboembolism event 
(VTE) from 38 comparative phase III trials. The results obtained using the 
new proposal are reported in the supplemental article [Wang et al. (2009)]. 

4. Numerical studies to evaluate performance of the new proposal. We
conducted extensive numerical studies to examine the performance of the
proposed interval estimation procedure for the percentiles of the random 
effects model under various practical settings. The existing random effects 
methods for meta-analysis have focused on making inferences about the 
mean of the random effects distribution. To the best of our knowledge, no 
other methods address the same issue as our proposed procedure does. Our 
numerical studies included the DL interval estimation method, the method
proposed by Sidik and Jonkman (2002) (SJ), and the one based on
$\tilde{T}(\cdot)$ for comparisons. We considered cases with binary or
continuous responses, various symmetric or asymmetric random effects
distributions, and a wide range of study sample sizes and number of studies.
From the results of our numerical investigation, we find that the new
proposal performs well with respect to the confidence interval coverage level
and length. The DL (or SJ) method tends to be liberal, that is, the empirical
coverage levels can be markedly lower than their nominal counterparts. The
procedure based on the test statistic $\tilde{T}(\cdot)$ produces confidence
intervals whose median lengths are uniformly larger than those with our
method. For percentiles other than the median, the method based on
$\tilde{T}_p(\cdot)$ may have under-coverage.

Specifically, in our numerical studies, we first considered meta-analysis
for multiple $2 \times 2$ tables under settings similar to the meta-analysis
of VTE rates in Figure 3 of Bennett et al. (2008). There are 41 studies listed
and the raw data are available for 40 studies. We let
$\Theta_k = \log(P_{1k}/P_{0k})$ be the log-relative risk for the $k$th study,
where $P_{1k}$ and $P_{0k}$ are the underlying event rates for the ESA and
control groups, respectively. We then assumed that the random vectors
$(\mathrm{logit}(P_{0k}), \mathrm{logit}(P_{1k}))'$ were a random sample of
size $K$ from a bivariate normal, whose mean $\eta$ and variance-covariance
matrix $\Sigma$ were estimated by their sample counterparts via the observed
rates in Figure 3 of Bennett et al. (2008). We used the conventional 0.5
continuity correction for studies with zero cells. The resulting sample mean
and variance-covariance matrix are $(-3.56, -2.86)'$ and [matrix illegible],
respectively. The density of $\Theta$ is given in Figure 1 [panel (a)], which
appears to be quite symmetric. For each realization
$\{(P_{0k}, P_{1k})', k = 1, \ldots, K\}$, we generated the corresponding set
of $2 \times 2$ tables. We then used DL, SJ, $\hat{T}(\cdot)$ and
$\tilde{T}(\cdot)$ to construct 95% confidence intervals for the median of the
distribution of $\Theta$. For each realized data set, we excluded studies with
0-0 cells (that is, no events occurred in either group), and used the 0.5
continuity correction for studies with one zero cell. The average empirical
coverage levels and the median interval lengths were obtained from 2000
realized data sets.

Fig. 1. The true density functions for the random log-relative-risk parameter
for the simulation study: (a) logit-normal; (b) bivariate beta.
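
The data-generating step of this simulation can be sketched as follows. The mean vector $(-3.56, -2.86)'$ is taken from the text; the Cholesky factor passed in the usage note, the arm sizes, and the function names are hypothetical, since the estimated variance-covariance matrix and the per-study sample sizes are not reproduced here. The standard error uses the usual delta-method formula for a log relative risk, which the text does not state explicitly.

```python
import math
import random

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def simulate_study(rng, mean, chol, n0, n1):
    """Draw (logit P0, logit P1) from a bivariate normal, then one 2 x 2 table.

    chol is the lower-triangular Cholesky factor of the covariance matrix.
    Returns (events0, n0, events1, n1) for the control and ESA arms.
    """
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    logit0 = mean[0] + chol[0][0] * z0
    logit1 = mean[1] + chol[1][0] * z0 + chol[1][1] * z1
    p0, p1 = expit(logit0), expit(logit1)
    e0 = sum(rng.random() < p0 for _ in range(n0))
    e1 = sum(rng.random() < p1 for _ in range(n1))
    return e0, n0, e1, n1

def log_relative_risk(e0, n0, e1, n1):
    """Estimate and standard error of log(P1/P0) for one table.

    Returns None for a 0-0 table (excluded); applies the 0.5 continuity
    correction when the table has a zero cell.
    """
    if e0 == 0 and e1 == 0:
        return None
    if 0 in (e0, e1, n0 - e0, n1 - e1):
        e0, e1, n0, n1 = e0 + 0.5, e1 + 0.5, n0 + 1.0, n1 + 1.0
    theta = math.log((e1 / n1) / (e0 / n0))
    se = math.sqrt(1.0 / e1 - 1.0 / n1 + 1.0 / e0 - 1.0 / n0)
    return theta, se
```

A call such as `simulate_study(random.Random(1), (-3.56, -2.86), [[1.0, 0.0], [0.6, 0.8]], 200, 200)` produces one table under a placeholder covariance with unit variances and correlation 0.6. Repeating this for $K$ studies and feeding the resulting estimates and standard errors to an interval procedure gives one replicate of the coverage experiment.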

Under the same setting, we repeated this process with $K = 40$, 30, 20,
10 and 6. For each $K$, the sample sizes came from the first $K$ studies
listed in Figure 3 of Bennett et al. (2008). The results are summarized in
Table 1 (top half). The average coverage levels for our proposed method,
$\hat{T}(\cdot)$, range from 0.94 to 0.95. On the other hand, the average
empirical coverage level can be as low as 0.86 for the DL method, and 0.88 for
the SJ method. The median lengths of the intervals obtained via
$\hat{T}(\cdot)$ are uniformly smaller than those of the procedure using
$\tilde{T}(\cdot)$. In Table 2 (top half), we report the results for the 25th
and 75th percentiles. Again our proposal behaves well, but the one with
$\tilde{T}_p(\cdot)$ may not have the correct coverage level.

We also considered rather asymmetric random effects distributions. For
example, we considered a bivariate beta distribution for
$\{(P_{0k}, P_{1k})', k = 1, \ldots, 40\}$ via three independent gamma random
variables with a common unit scale parameter and shape parameters of 2, 8 and
10, respectively [Olkin and Liu (2003)]. The resulting density function of the
random parameter $\Theta$, the log-relative risk, is given in Figure 1 [panel
(b)]. Under the same setting as the previous simulation, the results are
reported in the bottom half portions of Tables 1 and 2. Again, the new
procedure performs well. The DL (or SJ) method still has coverage problems.
Although the DL method produces confidence interval estimates for the mean of
$G(\cdot)$, not the median, its empirical coverage for the mean was also lower
than the nominal 95%. For example, when $K = 40$, the coverage of DL for the
mean was only 64%.

Table 1

Empirical coverage levels (ECL) and median lengths (ML) of 95% interval
estimates for the median based on DerSimonian-Laird (DL), Sidik and Jonkman
(SJ), $\hat{T}(\cdot)$ and $\tilde{T}(\cdot)$ with a bivariate logit-normal or
a bivariate beta distribution for the two underlying random event rates

Number of          DL             SJ         $\hat{T}(\cdot)$  $\tilde{T}(\cdot)$
studies, K     ECL    ML      ECL    ML      ECL    ML         ECL    ML

Bivariate logit-normal
    40         86%   0.62     88%   0.65     94%   0.72        95%   0.90
    30         88%   0.71     91%   0.75     94%   0.83        95%   1.03
    20         88%   0.85     91%   0.90     94%   1.00        95%   1.23
    10         88%   1.18     94%   1.36     95%   1.54        97%   2.15
     6         91%   1.57     97%   2.06     95%   2.29        97%   2.89

Bivariate beta
    40         87%   0.40     89%   0.42     95%   0.52        96%   0.65
    30         88%   0.46     90%   0.48     95%   0.61        96%   0.75
    20         90%   0.55     92%   0.59     96%   0.75        96%   0.91
    10         91%   0.76     93%   0.89     96%   1.10        98%   1.56
     6         88%   1.00     94%   1.30     95%   1.58        97%   2.10
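
A sketch of the bivariate beta draw follows, assuming the Olkin and Liu (2003) construction in which two gamma variables are each normalized by their sum with a shared third gamma variable. The assignment of the shape parameters 2, 8 and 10 to the control rate, the ESA rate and the shared component is an assumption made only for illustration, as is the function name.

```python
import math
import random

def draw_event_rates(rng, shape0=2.0, shape1=8.0, shape_shared=10.0):
    """Draw a correlated pair of event rates from a bivariate beta.

    Assumed construction: P0 = U / (U + W), P1 = V / (V + W) with
    independent U, V, W ~ Gamma(shape, scale = 1).
    """
    u = rng.gammavariate(shape0, 1.0)
    v = rng.gammavariate(shape1, 1.0)
    w = rng.gammavariate(shape_shared, 1.0)
    return u / (u + w), v / (v + w)

def log_relative_risk_parameter(p0, p1):
    """Theta = log(P1 / P0) for one study."""
    return math.log(p1 / p0)
```

Under this construction the two rates are marginally Beta(2, 10) and Beta(8, 10) and are positively correlated through the shared component, which yields a skewed distribution for the log-relative risk.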

Although our method assumes that the random effects distribution is
continuous, we also considered cases with fixed effects models in our
numerical study. For example, we let $(P_{0k}, P_{1k}) = (0.1, 0.2)$,
$k = 1, \ldots, K$. The results are summarized in Table 3. For this case, the
DL method has the correct coverage level for most scenarios, and our interval
estimation procedure is comparable with the DL method with respect to
efficiency, as reflected in the interval length. We also studied the
performance of our method for $\Theta_k = P_{1k} - P_{0k}$, the risk
difference for the $k$th study. The results were very similar to those for the
relative risk.

Our numerical studies with continuous responses yielded similar results. 
We summarize the study settings and the results in the supplemental article 
[Wang et al. (2009)]. We expect similar results for censored time to event ob- 
servations, where hazard ratios are used for treatment effect measurements. 
Table 2

Empirical coverage levels (ECL) and median lengths (ML) of 95% confidence
intervals for the 25th and 75th percentiles based on $\hat{T}_p(\cdot)$ and
$\tilde{T}_p(\cdot)$ with a bivariate logit-normal or a bivariate beta
distribution for the two underlying random event rates

                        25th percentile                         75th percentile
Number of      $\hat{T}_p(\cdot)$  $\tilde{T}_p(\cdot)$  $\hat{T}_p(\cdot)$  $\tilde{T}_p(\cdot)$
studies, K      ECL    ML           ECL    ML             ECL    ML           ECL    ML

Bivariate logit-normal
    40          95%   0.86          86%   1.16            95%   0.81          92%   0.92
    35          96%   0.91          88%   1.21            96%   0.86          90%   1.02
    30          96%   1.00          90%   1.37            96%   0.94          91%   1.12
    25          96%   1.12          90%   1.49            97%   1.06          92%   1.23
    20          96%   1.24          92%   1.52            97%   1.16          92%   1.32

Bivariate beta
    40          96%   0.48          93%   0.55            96%   0.73          92%   0.96
    35          96%   0.52          95%   0.61            96%   0.78          93%   1.04
    30          95%   0.56          94%   0.64            96%   0.85          93%   1.07
    25          96%   0.62          93%   0.65            96%   0.94          92%   1.10
    20          96%   0.72          95%   0.80            96%   1.37          95%   1.37

Table 3

Empirical coverage levels (ECL) and median lengths (ML) of 95% interval
estimates for the median based on DerSimonian-Laird (DL), $\hat{T}(\cdot)$ and
$\tilde{T}(\cdot)$ under a fixed effect model (the underlying event rates are
0.1 and 0.2)

Number of          DL         $\hat{T}(\cdot)$  $\tilde{T}(\cdot)$
studies, K     ECL    ML      ECL    ML         ECL    ML

    40         92%   0.24     95%   0.27        96%   0.35
    30         94%   0.26     95%   0.30        96%   0.39
    20         95%   0.30     95%   0.35        97%   0.45
    10         97%   0.47     96%   0.57        98%   0.84
     6         96%   0.75     95%   1.03        97%   1.34


5. Discussion. In this article we present a simple nonparametric interval 
estimation procedure for percentiles of the random effects distribution. Ran- 
dom effects meta-analysis is frequently employed in medical research. How- 
ever, the validity of the most popular method (DL) and its variations [Hardy 
and Thompson (1996), Biggerstaff and Tweedie (1997), Hartung (1999), 
Hartung and Knapp (2001a, 2001b) and DerSimonian and Kacker (2007)] is 
not clear when the number of studies is not large or the parametric assump- 
tion for the random effects is violated. An excellent review on meta-analysis 
with the random effects model is given by Sutton and Higgins (2008). In 
contrast to previous methods, our proposal does not require the number of
studies to be large. The new proposal is valid provided the individual study 
sample sizes are large. 

In addition, if the random effects distribution is symmetric and the exact
distribution of $\hat{\Theta}_k$, $k = 1, \ldots, K$, conditional on $\Pi_k$,
is symmetric around the unknown fixed realized $\Theta_k$, it is easy to show
that the resulting interval estimators based on $\hat{T}(\cdot)$ for the
median (or mean) are valid without requiring the sizes of the individual
studies or the number of studies to be large. For instance, under the usual
two-sample location shift model with continuous response variable, let
$\Theta_k$ be the location shift parameter of interest. Then, the two-sample
rank estimator is symmetric around $\Theta_k$ under rather mild conditions
[Lehmann (1975), page 86]. If the unspecified random effects distribution is
symmetric around $\mu$, one can use our procedure to obtain exact confidence
intervals for $\mu$. To examine the performance of the method in this setting,
we conducted a simulation study, described in detail in the supplemental
article [Wang et al. (2009)].

The proposed procedure can be implemented with study level summary 
statistics. When patient level data are available, various novel procedures 
have been studied for mixed effects regression models for continuous, dis- 
crete or censored event time observations [Laird and Ware (1982), Hougaard 
(1995), Hogan and Laird (1997), Henderson, Diggle and Dobson (2000), 
Lam, Lee and Leung (2002), Nelder, Lee and Pawitan (2006), Cai, Cheng 
and Wei (2002), Zeng and Lin (2007) and Zeng, Lin and Lin (2008)]. To the 
best of our knowledge, all of the existing asymptotic procedures for mixed 
effects models assume that the number of studies is large. 

In the current practice of meta-analysis, inferences are made only for the 
"center" of the random effects distribution. A conclusion on the risk or ben- 
efit from an intervention based solely on an estimated center of the random 
effects distribution provides limited information and is usually not sufficient. 
If the number of studies involved is not small, we highly recommend esti- 
mating this distribution or its percentiles as proposed in this article. 

Under the fixed effects model, this distribution has a single unknown 
mass point. The standard estimation procedure for such a fixed parameter 
value utilizes a weighted average of study-specific point estimates. For an- 
alyzing multiple 2x2 tables, the most commonly used procedures are the 
Mantel-Haenszel [Mantel and Haenszel (1959)] and Peto methods [Yusuf et 
al. (1985)]. These methods are valid when the number of studies and each 
individual study sample size are large. Moreover, when the event rate is 
small, these standard methods may not perform well. For the fixed effects 
model, Tian et al. (2009) proposed a general exact interval estimation proce- 
dure that combines study-specific exact confidence intervals instead of point 
estimates. If the fixed effects model is approximately correct, the existing 
interval procedures for the common parameter value $\mu$ may be more efficient
than those developed under the random effects model. The standard
heterogeneity tests generally do not have the power to detect violations of the fixed
effects modeling assumption. Therefore, in practice, sensitivity analyses with 
both random and fixed effects models are highly recommended. 

APPENDIX: JUSTIFICATION FOR THE CONDITIONAL TEST $\hat{T}(\cdot)$
BASED ON THE APPROXIMATION GENERATED BY $\hat{T}^*(\cdot)$

Let $D_k = |\Phi((\mu - \hat{\Theta}_k)/\sigma_k) - 1/2| \hat{B}_k - B_k/2$.
We show that $D_k$ goes to 0, in probability, as $n_k \to \infty$. Here, the
probability is generated by the random element $(X_k, \Pi_k)$. For any fixed
positive constant $c$, first we show that
$\mathrm{pr}(|D_k| > c \mid \Pi_k) \to 0$ for any given $\Pi_k$ with
$\Theta_k \ne \mu$. To this end, consider two cases. First, if
$\Theta_k < \mu$, then conditional on $\Pi_k$,

$$|D_k| = |\Phi((\mu - \hat{\Theta}_k)/\sigma_k) - 1| = 1 - \Phi((\mu - \Theta_k)/\sigma_k + (\Theta_k - \hat{\Theta}_k)/\sigma_k).$$

As $n_k \to \infty$, $(\mu - \Theta_k)/\sigma_k \to \infty$ in probability, and
$(\hat{\Theta}_k - \Theta_k)/\sigma_k \to N(0, 1)$ in distribution. Therefore,
for any $c > 0$, we can find $N$ such that, when $n_k > N$,
$\mathrm{pr}((\mu - \Theta_k)/\sigma_k + (\Theta_k - \hat{\Theta}_k)/\sigma_k < \Phi^{-1}(1 - c)) < c$,
which is equivalent to
$\mathrm{pr}(\Phi((\mu - \hat{\Theta}_k)/\sigma_k) < 1 - c) = \mathrm{pr}(|D_k| > c) < c$.
Therefore, $\mathrm{pr}(|D_k| > c \mid \Pi_k) \to 0$. Similarly, if
$\Theta_k > \mu$, we can show that $\mathrm{pr}(|D_k| > c \mid \Pi_k) \to 0$ as
$n_k \to \infty$. Therefore, $\mathrm{pr}(|D_k| > c \mid \Pi_k) \to 0$ for any
$\Pi_k$ such that $\Theta_k \ne \mu$.

This, coupled with the fact that $G(\cdot)$ is continuous, implies that
$\mathrm{pr}(|D_k| > c) = E_{\Pi_k}\{\mathrm{pr}(|D_k| > c \mid \Pi_k)\} \to 0$
for any $c$ by the dominated convergence theorem. Therefore, $D_k \to 0$ in
probability as $n_k \to \infty$. It follows that
$|\hat{T}(\mu) - \sum_{k=1}^{K} B_k/2| \to 0$, in probability, as
$\min\{n_1, \ldots, n_K\} \to \infty$.
Similarly, since

$$\bigl| |\Phi((\mu - \hat{\Theta}_k)/\sigma_k) - 1/2| \Delta_k - |I(\Theta_k < \mu) - 1/2| \Delta_k \bigr| \le |D_k|,$$

one can show that
$\hat{T}^*(\mu) - \sum_{k=1}^{K} |I(\Theta_k < \mu) - 1/2| \Delta_k \to 0$, in
probability, as $\min\{n_1, \ldots, n_K\} \to \infty$, where

$$\Delta_k = \begin{cases} 1, & \text{with probability } p, \\ -1, & \text{with probability } 1 - p, \end{cases}$$

for the $100p$th percentile and is independent of the data. Therefore, for any
$t$ and positive $c$,

$$\mathrm{pr}_{\{(X_k, \Pi_k),\, k=1,\ldots,K\}} \left\{ \left| \mathrm{pr}\bigl(\hat{T}^*(\mu) < t \mid (X_k, \Pi_k)_{k=1,\ldots,K}\bigr) - \mathrm{pr}\left( \sum_{k=1}^{K} \Delta_k/2 < t \right) \right| > c \right\} < c,$$

when $\min\{n_1, \ldots, n_K\}$ is large. This, coupled with the fact that
$\sum_{k=1}^{K} B_k/2 \sim \sum_{k=1}^{K} \Delta_k/2$ under the null hypothesis
that the $100p$th percentile of $\Theta_k$ is $\mu$, implies that one can
approximate the null distribution of $\hat{T}(\mu)$ by the distribution of
$\hat{T}^*(\mu)$ conditional on the observed data.

SUPPLEMENTARY MATERIAL

Additional examples, simulation results and computer codes (DOI: 
10.1214/09-AOAS280SUPP; .pdf). We present the results for the mortal- 
ity data set restricted to the six trials for anemia of cancer and the results 
for the venous thromboembolism rates data set in Bennett et al. (2008) 
using the proposed approach, report the simulation results for continuous 
responses and for the setting where the sample sizes for individual studies are 
small, and provide R codes for implementation of the proposed procedure. 